-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Time series based workload desc order optimization through reverse segment read #7244
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: gashutos <[email protected]>
Signed-off-by: gashutos <[email protected]>
Signed-off-by: gashutos <[email protected]>
Signed-off-by: gashutos <[email protected]>
Signed-off-by: gashutos <[email protected]>
gashutos
requested review from
reta,
anasalkouz,
andrross,
Bukhtawar,
CEHENKLE,
dblock,
gbbafna,
setiah,
kartg,
kotwanikunal,
mch2,
nknize,
owaiskazi19,
Rishikesh1159,
ryanbogan,
saratvemulapalli,
shwetathareja,
dreamer-89,
tlfeng,
VachaShah and
xuezhou25
as code owners
April 19, 2023 11:41
Signed-off-by: gashutos <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-7244-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 4c98b3d38064b94a87f6d7a1359e623849459bac
# Push it to GitHub
git push --set-upstream origin backport/backport-7244-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x Then, create a pull request where the |
austintlee
pushed a commit
to austintlee/OpenSearch
that referenced
this pull request
Apr 28, 2023
…gment read (opensearch-project#7244) This commit changes the EngineConfig for timeseries indexes only (e.g., indexes that use the @timestamp metadata field) so that a descending LeafSorter comparator is used to visit segments in order of most newest to oldest. For the more infrequent case that a user chooses to sort query results by ASC time, this would cause a search regression so the ContextIndexSearcher is updated to inspect the sort order from the search request and reverse the comparator so segments are visited in ascending order. LeafSorter behavior for non-timeseries indexes is left the same. Signed-off-by: gashutos <[email protected]> Signed-off-by: Chaitanya Gohel <[email protected]>
gashutos
added a commit
to gashutos/OpenSearch
that referenced
this pull request
May 8, 2023
…gment read (opensearch-project#7244) This commit changes the EngineConfig for timeseries indexes only (e.g., indexes that use the @timestamp metadata field) so that a descending LeafSorter comparator is used to visit segments in order of most newest to oldest. For the more infrequent case that a user chooses to sort query results by ASC time, this would cause a search regression so the ContextIndexSearcher is updated to inspect the sort order from the search request and reverse the comparator so segments are visited in ascending order. LeafSorter behavior for non-timeseries indexes is left the same. Signed-off-by: gashutos <[email protected]> Signed-off-by: Chaitanya Gohel <[email protected]>
reta
pushed a commit
that referenced
this pull request
May 8, 2023
…gment read (#7244) (#7457) This commit changes the EngineConfig for timeseries indexes only (e.g., indexes that use the @timestamp metadata field) so that a descending LeafSorter comparator is used to visit segments in order of most newest to oldest. For the more infrequent case that a user chooses to sort query results by ASC time, this would cause a search regression so the ContextIndexSearcher is updated to inspect the sort order from the search request and reverse the comparator so segments are visited in ascending order. LeafSorter behavior for non-timeseries indexes is left the same. Signed-off-by: gashutos <[email protected]> Signed-off-by: Chaitanya Gohel <[email protected]>
andrross
added a commit
to andrross/OpenSearch
that referenced
this pull request
Jun 2, 2023
…verse segment read (opensearch-project#7244)" This reverts commit 4c98b3d. Reverting due to issue reported in opensearch-project#7878.
andrross
added a commit
to andrross/OpenSearch
that referenced
this pull request
Jun 2, 2023
…verse segment read (opensearch-project#7244)" This reverts commit 4c98b3d. Reverting due to issue reported in opensearch-project#7878. Signed-off-by: Andrew Ross <[email protected]>
6 tasks
andrross
added a commit
to andrross/OpenSearch
that referenced
this pull request
Jun 2, 2023
…verse segment read (opensearch-project#7244)" This reverts commit 4c98b3d. Reverting due to issue reported in opensearch-project#7878. Signed-off-by: Andrew Ross <[email protected]>
andrross
added a commit
to andrross/OpenSearch
that referenced
this pull request
Jun 2, 2023
…verse segment read (opensearch-project#7244)" This reverts commit 4c98b3d. Reverting due to issue reported in opensearch-project#7878. Signed-off-by: Andrew Ross <[email protected]>
andrross
added a commit
that referenced
this pull request
Jun 2, 2023
…verse segment read (#7244)" (#7892) This reverts commit 4c98b3d. Reverting due to issue reported in #7878. Signed-off-by: Andrew Ross <[email protected]>
andrross
added a commit
that referenced
this pull request
Jun 2, 2023
…verse segment read (#7244)" (#7893) This reverts commit 4c98b3d. Reverting due to issue reported in #7878. Signed-off-by: Andrew Ross <[email protected]>
reta
pushed a commit
that referenced
this pull request
Jun 2, 2023
…tion through re… (#7895) * Revert "Time series based workload desc order optimization through reverse segment read (#7244)" (#7892) This reverts commit 4c98b3d. Reverting due to issue reported in #7878. Signed-off-by: Andrew Ross <[email protected]> (cherry picked from commit bb26536) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * Remove unused imports Signed-off-by: Andrew Ross <[email protected]> --------- Signed-off-by: Andrew Ross <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Andrew Ross <[email protected]>
gashutos
added a commit
to gashutos/OpenSearch
that referenced
this pull request
Jun 8, 2023
…rough reverse segment read (opensearch-project#7244)" (opensearch-project#7892)" This reverts commit bb26536.
gashutos
added a commit
to gashutos/OpenSearch
that referenced
this pull request
Jun 8, 2023
…rough reverse segment read (opensearch-project#7244)" (opensearch-project#7892)" This reverts commit bb26536. Signed-off-by: gashutos <[email protected]>
6 tasks
andrross
pushed a commit
that referenced
this pull request
Jun 12, 2023
…gh reverse segment read (#7244)] with fixes (#7967) * Revert "Revert "Time series based workload desc order optimization through reverse segment read (#7244)" (#7892)" This reverts commit bb26536. Signed-off-by: gashutos <[email protected]> * Enable time series optimization only if it is not IndexSorted index, also ASC order reverse should only consider in @timestamp field Signed-off-by: gashutos <[email protected]> * Modifying CHANGELOG Signed-off-by: gashutos <[email protected]> * Adding integ test for scroll API where sort by _doc is getting early termination Signed-off-by: gashutos <[email protected]> --------- Signed-off-by: gashutos <[email protected]>
gashutos
added a commit
to gashutos/OpenSearch
that referenced
this pull request
Jun 13, 2023
…gh reverse segment read (opensearch-project#7244)] with fixes (opensearch-project#7967) * Revert "Revert "Time series based workload desc order optimization through reverse segment read (opensearch-project#7244)" (opensearch-project#7892)" This reverts commit bb26536. Signed-off-by: gashutos <[email protected]> * Enable time series optimization only if it is not IndexSorted index, also ASC order reverse should only consider in @timestamp field Signed-off-by: gashutos <[email protected]> * Modifying CHANGELOG Signed-off-by: gashutos <[email protected]> * Adding integ test for scroll API where sort by _doc is getting early termination Signed-off-by: gashutos <[email protected]> --------- Signed-off-by: gashutos <[email protected]>
gbbafna
pushed a commit
that referenced
this pull request
Jun 13, 2023
…gh reverse segment read (#7244)] with fixes (#7967) (#8037) Signed-off-by: gashutos <[email protected]>
gaiksaya
pushed a commit
to gaiksaya/OpenSearch
that referenced
this pull request
Jun 26, 2023
…tion through re… (opensearch-project#7895) * Revert "Time series based workload desc order optimization through reverse segment read (opensearch-project#7244)" (opensearch-project#7892) This reverts commit 4c98b3d. Reverting due to issue reported in opensearch-project#7878. Signed-off-by: Andrew Ross <[email protected]> (cherry picked from commit bb26536) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * Remove unused imports Signed-off-by: Andrew Ross <[email protected]> --------- Signed-off-by: Andrew Ross <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Andrew Ross <[email protected]>
gaiksaya
pushed a commit
to gaiksaya/OpenSearch
that referenced
this pull request
Jun 26, 2023
…gh reverse segment read (opensearch-project#7244)] with fixes (opensearch-project#7967) (opensearch-project#8037) Signed-off-by: gashutos <[email protected]>
imRishN
pushed a commit
to imRishN/OpenSearch
that referenced
this pull request
Jun 27, 2023
…gh reverse segment read (opensearch-project#7244)] with fixes (opensearch-project#7967) * Revert "Revert "Time series based workload desc order optimization through reverse segment read (opensearch-project#7244)" (opensearch-project#7892)" This reverts commit bb26536. Signed-off-by: gashutos <[email protected]> * Enable time series optimization only if it is not IndexSorted index, also ASC order reverse should only consider in @timestamp field Signed-off-by: gashutos <[email protected]> * Modifying CHANGELOG Signed-off-by: gashutos <[email protected]> * Adding integ test for scroll API where sort by _doc is getting early termination Signed-off-by: gashutos <[email protected]> --------- Signed-off-by: gashutos <[email protected]> Signed-off-by: Rishab Nahata <[email protected]>
shiv0408
pushed a commit
to Gaurav614/OpenSearch
that referenced
this pull request
Apr 25, 2024
…gment read (opensearch-project#7244) This commit changes the EngineConfig for timeseries indexes only (e.g., indexes that use the @timestamp metadata field) so that a descending LeafSorter comparator is used to visit segments in order of most newest to oldest. For the more infrequent case that a user chooses to sort query results by ASC time, this would cause a search regression so the ContextIndexSearcher is updated to inspect the sort order from the search request and reverse the comparator so segments are visited in ascending order. LeafSorter behavior for non-timeseries indexes is left the same. Signed-off-by: gashutos <[email protected]> Signed-off-by: Chaitanya Gohel <[email protected]> Signed-off-by: Shivansh Arora <[email protected]>
shiv0408
pushed a commit
to Gaurav614/OpenSearch
that referenced
this pull request
Apr 25, 2024
…verse segment read (opensearch-project#7244)" (opensearch-project#7892) This reverts commit 4c98b3d. Reverting due to issue reported in opensearch-project#7878. Signed-off-by: Andrew Ross <[email protected]> Signed-off-by: Shivansh Arora <[email protected]>
shiv0408
pushed a commit
to Gaurav614/OpenSearch
that referenced
this pull request
Apr 25, 2024
…gh reverse segment read (opensearch-project#7244)] with fixes (opensearch-project#7967) * Revert "Revert "Time series based workload desc order optimization through reverse segment read (opensearch-project#7244)" (opensearch-project#7892)" This reverts commit bb26536. Signed-off-by: gashutos <[email protected]> * Enable time series optimization only if it is not IndexSorted index, also ASC order reverse should only consider in @timestamp field Signed-off-by: gashutos <[email protected]> * Modifying CHANGELOG Signed-off-by: gashutos <[email protected]> * Adding integ test for scroll API where sort by _doc is getting early termination Signed-off-by: gashutos <[email protected]> --------- Signed-off-by: gashutos <[email protected]> Signed-off-by: Shivansh Arora <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
EDIT -> After incorporating comments/suggestions from Nick, Andriya & Andorss, we decided to limit this change only to @timestamp field indices and also we changed default segment search order for these indices as Descending (from latest to the oldest segments), and also safe guarded rare ASC queries on time series based workload not to traverse on default DESC order to avoid any regression. The perf gains still remains similar.
As described in issue OpenSearch-6814, desc order sort queries are taking more time (3x or more) compare to asc order queries on same data set (time series based).
That is because all latest documents in time series workload are in latest segments when our IndexSearcher search documents in order of oldest segments to newest segments.
In this PR, we changed the leafReaderContext (segments) order in ContextIndexSearcher in reverse if query type is descending order sort. This will by default behaviour for all type of workloads and not just for time series based workloads. However it will only benefit to workload which has ever increasing data values. And for other type of workload, it wont regress or impact since data is randomly distributed across all segments.
But as a precaution, in case any user/workload gets search latency hit, I have added index level setting to disable this optimization.
Optimization gains
We are seeing 3x to 5x gains on below configuration.
Issues Resolved
OpenSearch-6814
Next action item
On same segment of time series based workload, ASC queiry is faster compare to descending order. We need to look further to that.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.